NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

An Empirical Analysis and Resource Footprint Study of Deploying Large Language Models on Edge Devices

https://doi.org/10.1145/3603287.3651205

Dhar, Nobel; Deng, Bobin; Lo, Dan; Wu, Xiaofeng; Zhao, Liang; Suo, Kun (April 2024, ACM)

The success of ChatGPT is reshaping the landscape of the entire IT industry. The large language model (LLM) powering ChatGPT is experiencing rapid development, marked by enhanced features, improved accuracy, and reduced latency. Due to the execution overhead of LLMs, prevailing commercial LLM products typically manage user queries on remote servers. However, the escalating volume of user queries and the growing complexity of LLMs have led to servers becoming bottlenecks, compromising the quality of service (QoS). To address this challenge, a potential solution is to shift LLM inference services to edge devices, a strategy currently being explored by industry leaders such as Apple, Google, Qualcomm, Samsung, and others. Beyond alleviating the computational strain on servers and enhancing system scalability, deploying LLMs at the edge offers additional advantages. These include real-time responses even in the absence of network connectivity and improved privacy protection for customized or personal LLMs. This article delves into the challenges and potential bottlenecks currently hindering the effective deployment of LLMs on edge devices. Through deploying the LLaMa-2 7B model with INT4 quantization on diverse edge devices and systematically analyzing experimental results, we identify insufficient memory and/or computing resources on traditional edge devices as the primary obstacles. Based on our observation and empirical analysis, we further provide insights and design guidance for the next generation of edge devices and systems from both hardware and software directions
more » « less
Full Text Available
A Systematic Investigation of Hardware and Software in Electric Vehicular Platform

https://doi.org/10.1145/3603287.3651203

Suo, Kun; Vu, Long; Islam, Md Romyull; Dhar, Nobel; Nguyen, Tu N; He, Selena; Wu, Xiaofeng (April 2024, ACM)

In recent years, computing has been moving rapidly from the centralized cloud to various edges. For instance, electric vehicles (EVs), one of the next-generation computing platforms, have grown in popularity as a sustainable alternative to conventional vehicles. Compared with traditional ones, EVs have many unique advantages, such as less environmental pollution, high energy utilization efficiency, simple structure, and convenient maintenance etc. Meanwhile, it is also currently facing lots of challenges, including short cruising range, long charging time, inadequate supporting facilities, cyber security risks, etc. Nevertheless, electric vehicles are still developing as a future industry, and the number of users keeps growing, with governments and companies around the world continuously investing in promoting EV-related supply chains. As an emerging and important computing platform, we comprehensively study electric vehicular systems and state-of-the-art EV-related technologies. Specifically, this paper outlines electric vehicles’ history, major architecture and components in hardware and software, current state-of-the-art technologies, and anticipated future developments to reduce drawbacks and difficulties.
more » « less
Full Text Available
SwitchFlow: preemptive multitasking for deep learning

https://doi.org/10.1145/3464298.3493391

Wu, Xiaofeng; Rao, Jia; Chen, Wei; Huang, Hang; Ding, Chris; Huang, Heng (December 2021, the 22nd International Middleware Conference)

Full Text Available

Search for: All records